
[OpenVINO] Support Qwen3.5, Qwen3.5-MoE and Qwen3.6 #1689

Merged
rkazants merged 229 commits into huggingface:main from rkazants:support_qwen3_5 on May 5, 2026

Conversation

@rkazants
Collaborator

@rkazants rkazants commented Apr 15, 2026

What does this PR do?

Re-created PR #1634

Fixes 181271, 181280, 182003

Installation instructions:

pip install -U git+https://github.com/rkazants/optimum-intel.git@support_qwen3_5
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
pip install transformers==5.2.0
pip install requests torchvision opencv-python

Export command:

optimum-cli export openvino -m Qwen/Qwen3.5-0.8B Qwen3.5-0.8B
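
The same export can also be driven from the Python API; a minimal sketch under the same setup (the output directory name is an assumption):

```python
from optimum.intel.openvino import OVModelForVisualCausalLM

# Convert the Transformers checkpoint to OpenVINO IR on the fly
# (roughly equivalent to the optimum-cli command above).
model = OVModelForVisualCausalLM.from_pretrained("Qwen/Qwen3.5-0.8B", export=True)

# Save the converted model so it can be reloaded later without re-exporting.
model.save_pretrained("Qwen3.5-0.8B")
```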

Inference script:

from transformers import AutoProcessor
from transformers.video_utils import load_video
from huggingface_hub import hf_hub_download
from optimum.intel.openvino import OVModelForVisualCausalLM

# Directory produced by the optimum-cli export command above
model_dir = "Qwen3.5-0.8B"

processor = AutoProcessor.from_pretrained(model_dir)
model = OVModelForVisualCausalLM.from_pretrained(model_dir)

# Prepare video input
video_path = hf_hub_download(
                repo_id="raushan-testing-hf/videos-test",
                filename="sample_demo_1.mp4",
                repo_type="dataset",
            )
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")

messages = [
    {"role": "user", "content": [
        {"type": "video"},
        {"type": "text", "text": "Why is this video funny?"},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], videos=[input_video], return_tensors="pt")

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=100)
output_text = processor.decode(output_ids[0], skip_special_tokens=True)

print(output_text)
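
If only the model's reply is wanted, without the echoed prompt, the prompt tokens can be sliced off before decoding; a small sketch building on the script above:

```python
# output_ids also contains the prompt tokens, so decode only what was generated.
prompt_len = inputs["input_ids"].shape[1]
answer = processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
print(answer)
```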

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

rkazants added 2 commits May 4, 2026 16:46
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
@rkazants rkazants requested a review from popovaan May 4, 2026 18:39
@droans

droans commented May 4, 2026

Thanks for adding support!

I've been attempting to test this locally. However, there is an issue with the exported models and/or the OpenVINO implementation for Qwen.

I've exported copies of qwen3.5-9b along with the 27B and 35B-A3B versions of both Qwen3.5 and 3.6. These were all exported using the command optimum-cli export openvino -m qwen/qwen3.5-XXX --weight-format int4 /models/qwen3.5-xxxx-int4.

The 9B appears to work fine (aside from enabling/disabling thinking, but that's a different issue). The other two, though, are causing major issues.

First, neither of them will load on the GPU. When I attempt to load qwen3.6-27b, I receive this error:

Failed to initialize VLMPipeline: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -14 CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
Traceback (most recent call last):
  File "/app/src/engine/ov_genai/vlm.py", line 278, in load_model
    self.model_path = VLMPipeline(
                      ^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -14 CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST

When I attempt to do the same with qwen3.6-35b-a3b, I receive an error that originates from the same call but is slightly different:

Failed to initialize VLMPipeline: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -58 CL_INVALID_EVENT
Traceback (most recent call last):
  File "/app/src/engine/ov_genai/vlm.py", line 278, in load_model
    self.model_path = VLMPipeline(
                      ^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -58 CL_INVALID_EVENT

Second, the models are generating gibberish. I'm able to get them to load on my CPU, but the response doesn't make any sense:

Prompt: Tell me a joke
Response: <>D*!<*M@0IM#6/8=IQH4&Q.I"JG170O)JP$CLL8QH4Q&HB;2@>C&PF&;5Q=K$2%+.'?1!C!R!IPN(%E!MN(7G8B0B3+E'FA!)1.OM@,P;0+@3,0>-5QQ-@C1;B0A'$KP)BJ?7@JB788LQ:8/!%2?O%#8'E#9;(A65NG+(..L=:&N7".7"=A?B'0D*#=KI><O3J)?CB.=D2#8M-F.I>6.@R6P(%%8(;#45IB6->A68(>20:&PR0IM!!Q?QADB##I!FML';#7.E><.O0801MR6C)7M,6=4&D%;7NEDQ,*CNLO3:!3'*L45.'5OG,()%M2/L)8?*R?EP2F31&9/1K3?3HQRLEP"!D>OH3:@/.(R,"=!B&L>F28A<I++RK+P.2%98R(-7//M9A6(8)!3<O*GFBQ%B&!6D,Q+%BN='ORF61NI0>1J:@?LA=MH6KBP%(9+HPR26-A+P:QOD+CM2)0Q+=K>4LF;B:='3DDIQ.*K+C39'E2PO$FF7N+7F<2>/8+##-B;G?<"?BJH,#D>KG:9>8HL!1+8%RLP9PDH9@DL9#**K!?8ELCH,,QF9?@=)5'MJ#K5:'R,75L.8E8HN5$E1I7A5$R"'1/P2??&AC6G−7BB4−MMF<JL)27=− 
′
 >/QA9=(−3)G>?=6) 
′
 K%"/DH9GQ:6A9.5P30QOHM<:C&C(0=&'1ICPCK'N.<7%N67%42LI7NM(Q(>1DFO@2-$3?HAPR%:>P<%7P@.LJCN,47C7@@IPG"M%-2FOAP;E%EI?&K5&"(7!P7L7%/17RR!JO&?:M0G<-OR>-')?9;,!-8),/5A3J@490,)GAH4#G-(BI-@B":*M9<M#JD25%)8J)"I/2R%E"4-,I8(N6#',--R;9P6RE(I/N2;6@=')))1F(RH3:%#A6JER2:<P+@<L:PEFO(C8',D%DG!#BC%5(')!;I86G8?<8=C>-6+5.<8==%85O-6:DA=)6&OF@11"8L3/N-1%6;:E:FI(&@PD6+BE@B<L=<C#*O@D#?Q)O&<")7:5COMEN@?=8+(;7.#(&B2PI?R0%!8>J8N?B(%4(:01OHN7<%,:I(47QRJ5&@666!1'5,<L@:)"<<'"(J#8/N;O;3AI-P.AN3>6>%RN>3CB1%@G)(N-LOM/:GQNK2M6KBD&&A1,?)>Q0)?883B1#(+G"!AQ5#@R6:M)"%>1M8>'E70E0>J05IO,,E%2"@B4=8.6P;(=17C$GO-'M=&L$3PM;1;>>"FP*/">,6IHI+C$(K06L?779!>I-73,6/=QQ;Q%:88M7O;9=%5RI/B'E*DI@IIOM)!BG@JN0B21&A;J5$5#1DO.2I;&!7DB@OP6;(L6O?&"5P&H),KE?:$2)@RA7&7=R?;O=CG:GR)3/<N$6%6C1KDMM&AEGHMFI"3O 

I can submit an issue, provide more information, and/or move this discussion to openvinotoolkit/openvino.genai if necessary.

@rkazants
Collaborator Author

rkazants commented May 5, 2026

I can submit an issue, provide more information, and/or move this discussion to openvinotoolkit/openvino.genai if necessary.

Hi @droans,

Thanks for reporting this. Regarding the CPU issue, you are probably using the latest OpenVINO nightly build, where we have a regression. We are waiting for this PR to be merged: openvinotoolkit/openvino#35640

Regarding the GPU failures, that is a problem on the GPU plugin side.

Could you please create a GitHub issue and provide reproducers using the optimum-intel API: https://github.com/huggingface/optimum-intel/issues?

Best regards,
Roman
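
For reference, a minimal optimum-intel reproducer for the GPU load failure could look like the sketch below (the local model path and the device argument are assumptions, not taken from this thread):

```python
from optimum.intel.openvino import OVModelForVisualCausalLM

# Hypothetical path to one of the INT4 exports mentioned above.
model_dir = "/models/qwen3.6-27b-int4"

# Compile the exported model for the GPU plugin; if the bug reproduces,
# this is where the ProgramBuilder / clWaitForEvents error should surface.
model = OVModelForVisualCausalLM.from_pretrained(model_dir, device="GPU")
```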

rkazants added 5 commits May 5, 2026 04:51
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
@rkazants rkazants added the openvino-slow Runs OpenVINO slow tests with different versions of transformers label May 5, 2026
@rkazants rkazants merged commit 8ec3275 into huggingface:main May 5, 2026
62 of 80 checks passed
pull Bot pushed a commit to j3din00b/openvino.genai that referenced this pull request May 6, 2026
## Description
This PR enables the Qwen3.5 model in the VLM pipeline (SDPA use case only) and
updates tests and documentation.

Requires huggingface/optimum-intel#1689 for
model export.

Current WWB accuracy results:
```
Optimum vs HF
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_5_0_8b_fp16
INFO:whowhatbench.wwb:   similarity
0    0.990854
```

```
GenAI vs Optimum (default vision preprocessing)
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_5_0_8b_fp16
INFO:whowhatbench.wwb:   similarity
0    0.939989
```

```
GenAI vs Optimum (VISION_PREPROCESS=CPP)
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_5_0_8b_fp16
INFO:whowhatbench.wwb:   similarity
0    0.959576
```


CVS-181273


## Checklist:
- [x] This PR follows [GenAI Contributing
guidelines](https://github.com/openvinotoolkit/openvino.genai?tab=contributing-ov-file#contributing).
- [x] Tests have been updated or added to cover the new code.
- [x] This PR fully addresses the ticket.
- [x] I have made corresponding changes to the documentation.

---------

Co-authored-by: Copilot <copilot@github.com>

Labels

openvino-slow Runs OpenVINO slow tests with different versions of transformers
